Overview

Dataset statistics

Number of variables: 34
Number of observations: 2509
Missing cells: 10929
Missing cells (%): 12.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 666.6 KiB
Average record size in memory: 272.1 B
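
The derived figures in this table follow from the raw counts. The sketch below recomputes the missing-cell percentage and the average record size from the reported values, assuming that "KiB" means 1024 bytes and that the percentage is taken over all cells (variables × observations). The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 34
n_observations = 2509
missing_cells = 10929
total_size_kib = 666.6

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (12.8% and 272.1 B).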

Variable types

CAT: 22
NUM: 9
BOOL: 3

Warnings

Sex has constant value "Female" Constant
Relapse Free Status (Months) is highly correlated with Overall Survival (Months) High correlation
Overall Survival (Months) is highly correlated with Relapse Free Status (Months) High correlation
Tumor Other Histologic Subtype is highly correlated with Cancer Type Detailed and 1 other fields High correlation
Cancer Type Detailed is highly correlated with Tumor Other Histologic Subtype and 1 other fields High correlation
Oncotree Code is highly correlated with Cancer Type Detailed and 1 other fields High correlation
Patient's Vital Status is highly correlated with Overall Survival Status High correlation
Overall Survival Status is highly correlated with Patient's Vital Status High correlation
Type of Breast Surgery has 554 (22.1%) missing values Missing
Cellularity has 592 (23.6%) missing values Missing
Chemotherapy has 529 (21.1%) missing values Missing
Pam50 + Claudin-low subtype has 529 (21.1%) missing values Missing
ER status measured by IHC has 83 (3.3%) missing values Missing
ER Status has 40 (1.6%) missing values Missing
Neoplasm Histologic Grade has 121 (4.8%) missing values Missing
HER2 status measured by SNP6 has 529 (21.1%) missing values Missing
HER2 Status has 529 (21.1%) missing values Missing
Tumor Other Histologic Subtype has 135 (5.4%) missing values Missing
Hormone Therapy has 529 (21.1%) missing values Missing
Inferred Menopausal State has 529 (21.1%) missing values Missing
Integrative Cluster has 529 (21.1%) missing values Missing
Primary Tumor Laterality has 639 (25.5%) missing values Missing
Lymph nodes examined positive has 266 (10.6%) missing values Missing
Mutation Count has 152 (6.1%) missing values Missing
Nottingham prognostic index has 222 (8.8%) missing values Missing
Overall Survival (Months) has 528 (21.0%) missing values Missing
Overall Survival Status has 528 (21.0%) missing values Missing
PR Status has 529 (21.1%) missing values Missing
Radio Therapy has 529 (21.1%) missing values Missing
Relapse Free Status (Months) has 121 (4.8%) missing values Missing
3-Gene classifier subtype has 745 (29.7%) missing values Missing
Tumor Size has 149 (5.9%) missing values Missing
Tumor Stage has 721 (28.7%) missing values Missing
Patient's Vital Status has 529 (21.1%) missing values Missing
Patient ID has unique values Unique
Lymph nodes examined positive has 1196 (47.7%) zeros Zeros

Reproduction

Analysis started: 2022-07-02 13:54:06.565111
Analysis finished: 2022-07-02 13:54:37.709955
Duration: 31.14 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
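
The reported duration can be cross-checked from the two timestamps. This is a small sketch using only the standard library; the format string is an assumption about how the timestamps are written in this report.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-07-02 13:54:06.565111", FMT)
finished = datetime.strptime("2022-07-02 13:54:37.709955", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```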

Variables

Patient ID
Categorical

UNIQUE

Distinct: 2509
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
MB-0000 | 1 | < 0.1%
MB-6195 | 1 | < 0.1%
MB-6185 | 1 | < 0.1%
MB-6187 | 1 | < 0.1%
MB-6188 | 1 | < 0.1%
MB-6189 | 1 | < 0.1%
MB-6190 | 1 | < 0.1%
MB-6192 | 1 | < 0.1%
MB-6194 | 1 | < 0.1%
MB-6200 | 1 | < 0.1%
Other values (2499) | 2499 | 99.6%

[Figure: Frequencies of value counts]

Unique

Unique: 2509
Unique (%): 100.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 7
Mean length: 7.417696293
Min length: 7

Age at Diagnosis
Real number (ℝ≥0)

Distinct: 1843
Distinct (%): 73.8%
Missing: 11
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.42030024
Minimum: 21.93
Maximum: 96.29
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 21.93
5-th percentile: 38.1685
Q1: 50.92
median: 61.11
Q3: 70
95-th percentile: 81.0245
Maximum: 96.29
Range: 74.36
Interquartile range (IQR): 19.08

Descriptive statistics

Standard deviation: 13.03299717
Coefficient of variation (CV): 0.2157056009
Kurtosis: -0.5817580852
Mean: 60.42030024
Median Absolute Deviation (MAD): 9.57
Skewness: -0.1552842184
Sum: 150929.91
Variance: 169.8590152
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
45 | 11 | 0.4%
52 | 11 | 0.4%
64 | 10 | 0.4%
47 | 10 | 0.4%
43 | 9 | 0.4%
60 | 9 | 0.4%
65 | 8 | 0.3%
63 | 8 | 0.3%
49 | 8 | 0.3%
46 | 8 | 0.3%
Other values (1833) | 2406 | 95.9%
(Missing) | 11 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
21.93 | 1 | < 0.1%
26.36 | 1 | < 0.1%
26.72 | 1 | < 0.1%
27 | 1 | < 0.1%
27.56 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
96.29 | 1 | < 0.1%
92.14 | 1 | < 0.1%
90.43 | 1 | < 0.1%
90.23 | 1 | < 0.1%
90.08 | 1 | < 0.1%
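
Several of the statistics reported for Age at Diagnosis are derived from others. As a rough consistency check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
# Values copied from the Age at Diagnosis statistics.
minimum, maximum = 21.93, 96.29
q1, q3 = 50.92, 70.0
mean = 60.42030024
std = 13.03299717

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50%
cv = std / mean                   # relative dispersion
variance = std ** 2

print(f"range: {value_range:.2f}")
print(f"IQR: {iqr:.2f}")
print(f"CV: {cv:.7f}")
print(f"variance: {variance:.4f}")
```

The recomputed values agree with the report to the printed precision.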
 

Type of Breast Surgery
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 554
Missing (%): 22.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Mastectomy | 1170 | 46.6%
Breast Conserving | 785 | 31.3%
(Missing) | 554 | 22.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 17
Median length: 10
Mean length: 10.64447987
Min length: 3

Cancer Type
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Breast Cancer | 2506 | 99.9%
Breast Sarcoma | 3 | 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 14
Median length: 13
Mean length: 13.0011957
Min length: 13

Cancer Type Detailed
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Breast Invasive Ductal Carcinoma | 1865 | 74.3%
Breast Mixed Ductal and Lobular Carcinoma | 269 | 10.7%
Breast Invasive Lobular Carcinoma | 192 | 7.7%
Invasive Breast Carcinoma | 133 | 5.3%
Breast Invasive Mixed Mucinous Carcinoma | 25 | 1.0%
Breast | 21 | 0.8%
Breast Angiosarcoma | 2 | 0.1%
Metaplastic Breast Cancer | 2 | 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 41
Median length: 32
Mean length: 32.51654045
Min length: 6

Cellularity
Categorical

MISSING

Distinct: 3
Distinct (%): 0.2%
Missing: 592
Missing (%): 23.6%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
High | 965 | 38.5%
Moderate | 737 | 29.4%
Low | 215 | 8.6%
(Missing) | 592 | 23.6%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 4
Mean length: 4.853328019
Min length: 3

Chemotherapy
Boolean

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
No | 1568 | 62.5%
Yes | 412 | 16.4%
(Missing) | 529 | 21.1%

Pam50 + Claudin-low subtype
Categorical

MISSING

Distinct: 7
Distinct (%): 0.4%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
LumA | 700 | 27.9%
LumB | 475 | 18.9%
Her2 | 224 | 8.9%
claudin-low | 218 | 8.7%
Basal | 209 | 8.3%
Normal | 148 | 5.9%
NC | 6 | 0.2%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 11
Median length: 4
Mean length: 4.593862096
Min length: 2

Cohort
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 0.4%
Missing: 11
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.900320256
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 3
Q3: 4
95-th percentile: 7
Maximum: 9
Range: 8
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.962216157
Coefficient of variation (CV): 0.6765515474
Kurtosis: 1.198078033
Mean: 2.900320256
Median Absolute Deviation (MAD): 1
Skewness: 1.241724011
Sum: 7245
Variance: 3.850292248
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Common values

Value | Count | Frequency (%)
1 | 809 | 32.2%
3 | 763 | 30.4%
2 | 288 | 11.5%
4 | 238 | 9.5%
5 | 170 | 6.8%
7 | 105 | 4.2%
8 | 82 | 3.3%
9 | 40 | 1.6%
6 | 3 | 0.1%
(Missing) | 11 | 0.4%

Minimum 5 values

Value | Count | Frequency (%)
1 | 809 | 32.2%
2 | 288 | 11.5%
3 | 763 | 30.4%
4 | 238 | 9.5%
5 | 170 | 6.8%

Maximum 5 values

Value | Count | Frequency (%)
9 | 40 | 1.6%
8 | 82 | 3.3%
7 | 105 | 4.2%
6 | 3 | 0.1%
5 | 170 | 6.8%
 

ER status measured by IHC
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 83
Missing (%): 3.3%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Positve | 1817 | 72.4%
Negative | 609 | 24.3%
(Missing) | 83 | 3.3%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 7
Mean length: 7.110402551
Min length: 3
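
The label "Positve" is reported exactly as it appears in the data, and it differs from the "Positive" label used by ER Status. Before comparing the two columns, the spelling would presumably need to be harmonised. The following is a minimal sketch of such a clean-up on the reported counts; the mapping `LABEL_FIXES` and the helper `normalise_labels` are hypothetical names, and treating "Positve" as a misspelling of "Positive" is an assumption.

```python
from collections import Counter

# Counts copied from the frequency table (missing values left out).
er_ihc_counts = Counter({"Positve": 1817, "Negative": 609})

# Assumed correction: the misspelt label is taken to mean "Positive".
LABEL_FIXES = {"Positve": "Positive"}


def normalise_labels(counts, fixes):
    """Merge counts under corrected labels; unknown labels pass through unchanged."""
    cleaned = Counter()
    for label, count in counts.items():
        cleaned[fixes.get(label, label)] += count
    return cleaned


cleaned_counts = normalise_labels(er_ihc_counts, LABEL_FIXES)
print(dict(cleaned_counts))
```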

ER Status
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 40
Missing (%): 1.6%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Positive | 1825 | 72.7%
Negative | 644 | 25.7%
(Missing) | 40 | 1.6%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 8
Mean length: 7.920286967
Min length: 3

Neoplasm Histologic Grade
Categorical

MISSING

Distinct: 3
Distinct (%): 0.1%
Missing: 121
Missing (%): 4.8%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
3 | 1198 | 47.7%
2 | 976 | 38.9%
1 | 214 | 8.5%
(Missing) | 121 | 4.8%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

HER2 status measured by SNP6
Categorical

MISSING

Distinct: 4
Distinct (%): 0.2%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Neutral | 1436 | 57.2%
Gain | 438 | 17.5%
Loss | 101 | 4.0%
Undef | 5 | 0.2%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 7
Median length: 7
Mean length: 5.508170586
Min length: 3

HER2 Status
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Negative | 1733 | 69.1%
Positive | 247 | 9.8%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 8
Mean length: 6.945795138
Min length: 3

Tumor Other Histologic Subtype
Categorical

HIGH CORRELATION
MISSING

Distinct: 8
Distinct (%): 0.3%
Missing: 135
Missing (%): 5.4%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Ductal/NST | 1810 | 72.1%
Mixed | 269 | 10.7%
Lobular | 192 | 7.7%
Medullary | 32 | 1.3%
Mucinous | 25 | 1.0%
Tubular/ cribriform | 23 | 0.9%
Other | 21 | 0.8%
Metaplastic | 2 | 0.1%
(Missing) | 135 | 5.4%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 19
Median length: 10
Mean length: 8.86648067
Min length: 3

Hormone Therapy
Boolean

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Yes | 1216 | 48.5%
No | 764 | 30.5%
(Missing) | 529 | 21.1%

Inferred Menopausal State
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Post | 1556 | 62.0%
Pre | 424 | 16.9%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 4
Median length: 4
Mean length: 3.620167397
Min length: 3

Integrative Cluster
Categorical

MISSING

Distinct: 11
Distinct (%): 0.6%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Top categories: 8 (299), 3 (290), 4ER+ (260), 10 (226), 7 (190); other values (6): 715

Value | Count | Frequency (%)
8 | 299 | 11.9%
3 | 290 | 11.6%
4ER+ | 260 | 10.4%
10 | 226 | 9.0%
7 | 190 | 7.6%
5 | 190 | 7.6%
9 | 146 | 5.8%
1 | 139 | 5.5%
6 | 85 | 3.4%
4ER- | 83 | 3.3%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 4
Median length: 1
Mean length: 1.921881228
Min length: 1

Primary Tumor Laterality
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 639
Missing (%): 25.5%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Left | 973 | 38.8%
Right | 897 | 35.8%
(Missing) | 639 | 25.5%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 4
Mean length: 4.102829813
Min length: 3

Lymph nodes examined positive
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 32
Distinct (%): 1.4%
Missing: 266
Missing (%): 10.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.950512706
Minimum: 0
Maximum: 45
Zeros: 1196
Zeros (%): 47.7%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 2
95-th percentile: 10
Maximum: 45
Range: 45
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 4.017774231
Coefficient of variation (CV): 2.059855452
Kurtosis: 20.49639915
Mean: 1.950512706
Median Absolute Deviation (MAD): 0
Skewness: 3.831165202
Sum: 4375
Variance: 16.14250977
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=32)]

Common values

Value | Count | Frequency (%)
0 | 1196 | 47.7%
1 | 379 | 15.1%
2 | 189 | 7.5%
3 | 127 | 5.1%
4 | 63 | 2.5%
6 | 52 | 2.1%
5 | 49 | 2.0%
7 | 29 | 1.2%
8 | 22 | 0.9%
9 | 18 | 0.7%
Other values (22) | 119 | 4.7%
(Missing) | 266 | 10.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1196 | 47.7%
1 | 379 | 15.1%
2 | 189 | 7.5%
3 | 127 | 5.1%
4 | 63 | 2.5%

Maximum 5 values

Value | Count | Frequency (%)
45 | 1 | < 0.1%
41 | 1 | < 0.1%
33 | 1 | < 0.1%
31 | 1 | < 0.1%
30 | 1 | < 0.1%
 

Mutation Count
Real number (ℝ≥0)

MISSING

Distinct: 32
Distinct (%): 1.4%
Missing: 152
Missing (%): 6.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.578701739
Minimum: 1
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 7
95-th percentile: 12
Maximum: 80
Range: 79
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.967966534
Coefficient of variation (CV): 0.7112706001
Kurtosis: 63.00016081
Mean: 5.578701739
Median Absolute Deviation (MAD): 2
Skewness: 4.809036039
Sum: 13149
Variance: 15.74475842
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=32)]

Common values

Value | Count | Frequency (%)
5 | 327 | 13.0%
4 | 321 | 12.8%
3 | 306 | 12.2%
6 | 292 | 11.6%
2 | 251 | 10.0%
7 | 209 | 8.3%
1 | 151 | 6.0%
8 | 146 | 5.8%
9 | 114 | 4.5%
10 | 64 | 2.6%
Other values (22) | 176 | 7.0%
(Missing) | 152 | 6.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 151 | 6.0%
2 | 251 | 10.0%
3 | 306 | 12.2%
4 | 321 | 12.8%
5 | 327 | 13.0%

Maximum 5 values

Value | Count | Frequency (%)
80 | 1 | < 0.1%
46 | 1 | < 0.1%
40 | 1 | < 0.1%
35 | 1 | < 0.1%
31 | 1 | < 0.1%
 

Nottingham prognostic index
Real number (ℝ≥0)

MISSING

Distinct: 436
Distinct (%): 19.1%
Missing: 222
Missing (%): 8.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.028786847
Minimum: 1
Maximum: 7.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2.032
Q1: 3.048
median: 4.044
Q3: 5.04
95-th percentile: 6.064
Maximum: 7.2
Range: 6.2
Interquartile range (IQR): 1.992

Descriptive statistics

Standard deviation: 1.189091744
Coefficient of variation (CV): 0.2951488349
Kurtosis: -0.282896361
Mean: 4.028786847
Median Absolute Deviation (MAD): 0.996
Skewness: -0.07257616222
Sum: 9213.83552
Variance: 1.413939176
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
4.04 | 87 | 3.5%
3.04 | 75 | 3.0%
3.03 | 61 | 2.4%
4.03 | 60 | 2.4%
4.05 | 58 | 2.3%
4.06 | 50 | 2.0%
5.06 | 48 | 1.9%
5.05 | 48 | 1.9%
3.05 | 43 | 1.7%
5.04 | 39 | 1.6%
Other values (426) | 1718 | 68.5%
(Missing) | 222 | 8.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 5 | 0.2%
1.02 | 2 | 0.1%
1.022 | 1 | < 0.1%
1.024 | 2 | 0.1%
1.028 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
7.2 | 1 | < 0.1%
7.1 | 1 | < 0.1%
7.06 | 1 | < 0.1%
7 | 3 | 0.1%
6.9 | 2 | 0.1%
 

Oncotree Code
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
IDC | 1865 | 74.3%
MDLC | 269 | 10.7%
ILC | 192 | 7.7%
BRCA | 133 | 5.3%
IMMC | 25 | 1.0%
BREAST | 21 | 0.8%
PBS | 2 | 0.1%
MBC | 2 | 0.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 6
Median length: 3
Mean length: 3.195296931
Min length: 3

Overall Survival (Months)
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1743
Distinct (%): 88.0%
Missing: 528
Missing (%): 21.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 125.2442706
Minimum: 0
Maximum: 355.2
Zeros: 1
Zeros (%): < 0.1%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 19.1
Q1: 60.86666667
median: 116.4666667
Q3: 185.1333333
95-th percentile: 259.9666667
Maximum: 355.2
Range: 355.2
Interquartile range (IQR): 124.2666666

Descriptive statistics

Standard deviation: 76.11177154
Coefficient of variation (CV): 0.6077066136
Kurtosis: -0.7891674916
Mean: 125.2442706
Median Absolute Deviation (MAD): 61.0666667
Skewness: 0.373146439
Sum: 248108.9
Variance: 5793.001767
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
192.2 | 4 | 0.2%
38.03333333 | 3 | 0.1%
48.43333333 | 3 | 0.1%
128.3666667 | 3 | 0.1%
119.4666667 | 3 | 0.1%
96.96666667 | 3 | 0.1%
187.0333333 | 3 | 0.1%
117.6666667 | 3 | 0.1%
98.7 | 3 | 0.1%
150.6 | 3 | 0.1%
Other values (1733) | 1950 | 77.7%
(Missing) | 528 | 21.0%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.1 | 1 | < 0.1%
0.766666667 | 1 | < 0.1%
1.233333333 | 1 | < 0.1%
1.266666667 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
355.2 | 1 | < 0.1%
351 | 1 | < 0.1%
337.0333333 | 1 | < 0.1%
335.7333333 | 1 | < 0.1%
335.6 | 1 | < 0.1%
 

Overall Survival Status
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 528
Missing (%): 21.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Deceased | 1144 | 45.6%
Living | 837 | 33.4%
(Missing) | 528 | 21.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 6
Mean length: 6.280589876
Min length: 3

PR Status
Categorical

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Positive | 1040 | 41.5%
Negative | 940 | 37.5%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 8
Mean length: 6.945795138
Min length: 3

Radio Therapy
Boolean

MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Yes | 1173 | 46.8%
No | 807 | 32.2%
(Missing) | 529 | 21.1%

Relapse Free Status (Months)
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1972
Distinct (%): 82.6%
Missing: 121
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.8424874
Minimum: 0
Maximum: 384.21
Zeros: 4
Zeros (%): 0.2%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10.56
Q1: 40.56
median: 99.095
Q3: 167.64
95-th percentile: 249.705
Maximum: 384.21
Range: 384.21
Interquartile range (IQR): 127.08

Descriptive statistics

Standard deviation: 76.51949386
Coefficient of variation (CV): 0.7030296318
Kurtosis: -0.6060913652
Mean: 108.8424874
Median Absolute Deviation (MAD): 62.335
Skewness: 0.5204076728
Sum: 259915.86
Variance: 5855.232941
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
27.63 | 5 | 0.2%
189.47 | 5 | 0.2%
0 | 4 | 0.2%
190.46 | 4 | 0.2%
32.11 | 4 | 0.2%
189.67 | 4 | 0.2%
160.23 | 4 | 0.2%
79.93 | 4 | 0.2%
100.66 | 4 | 0.2%
42.24 | 4 | 0.2%
Other values (1962) | 2346 | 93.5%
(Missing) | 121 | 4.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 4 | 0.2%
0.03 | 1 | < 0.1%
0.1 | 1 | < 0.1%
0.33 | 1 | < 0.1%
0.36 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
384.21 | 1 | < 0.1%
370.03 | 1 | < 0.1%
359.08 | 1 | < 0.1%
346.38 | 1 | < 0.1%
332.93 | 1 | < 0.1%
 

Relapse Free Status
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 21
Missing (%): 0.8%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Not Recurred | 1486 | 59.2%
Recurred | 1002 | 39.9%
(Missing) | 21 | 0.8%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 12
Median length: 12
Mean length: 10.327222
Min length: 3

Sex
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Female | 2509 | 100.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6
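
A column flagged CONSTANT and REJECTED, such as Sex here, carries no information for modelling and is usually dropped. The sketch below shows one way to detect such columns in plain Python. The helper `constant_columns` and the three toy rows are hypothetical and only illustrate the idea; they are not taken from the dataset.

```python
def constant_columns(rows):
    """Return names of columns holding a single distinct non-missing value."""
    distinct = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:  # missing values do not count as a category
                distinct.setdefault(name, set()).add(value)
    return sorted(name for name, values in distinct.items() if len(values) == 1)


# Toy rows for illustration only.
rows = [
    {"Sex": "Female", "Cancer Type": "Breast Cancer"},
    {"Sex": "Female", "Cancer Type": "Breast Sarcoma"},
    {"Sex": "Female", "Cancer Type": "Breast Cancer"},
]

to_drop = constant_columns(rows)
print(to_drop)
```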

3-Gene classifier subtype
Categorical

MISSING

Distinct: 4
Distinct (%): 0.2%
Missing: 745
Missing (%): 29.7%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
ER+/HER2- Low Prolif | 640 | 25.5%
ER+/HER2- High Prolif | 617 | 24.6%
ER-/HER2- | 309 | 12.3%
HER2+ | 198 | 7.9%
(Missing) | 745 | 29.7%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 21
Median length: 20
Mean length: 12.65962535
Min length: 3

Tumor Size
Real number (ℝ≥0)

MISSING

Distinct: 138
Distinct (%): 5.8%
Missing: 149
Missing (%): 5.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.22009322
Minimum: 1
Maximum: 182
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 11
Q1: 17
median: 22.41
Q3: 30
95-th percentile: 52
Maximum: 182
Range: 181
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 15.37088305
Coefficient of variation (CV): 0.5862253394
Kurtosis: 17.8940317
Mean: 26.22009322
Median Absolute Deviation (MAD): 7.41
Skewness: 3.032791591
Sum: 61879.42
Variance: 236.2640457
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
20 | 269 | 10.7%
25 | 204 | 8.1%
15 | 186 | 7.4%
30 | 185 | 7.4%
18 | 86 | 3.4%
35 | 83 | 3.3%
22 | 82 | 3.3%
40 | 80 | 3.2%
16 | 70 | 2.8%
17 | 67 | 2.7%
Other values (128) | 1048 | 41.8%
(Missing) | 149 | 5.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 10 | 0.4%
1.7 | 1 | < 0.1%
2 | 4 | 0.2%
2.12 | 1 | < 0.1%
2.3 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
182 | 1 | < 0.1%
180 | 1 | < 0.1%
160 | 1 | < 0.1%
150 | 1 | < 0.1%
130 | 3 | 0.1%
 

Tumor Stage
Real number (ℝ≥0)

MISSING

Distinct: 5
Distinct (%): 0.3%
Missing: 721
Missing (%): 28.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.713646532
Minimum: 0
Maximum: 4
Zeros: 24
Zeros (%): 1.0%
Memory size: 19.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 3
Maximum: 4
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.6553072149
Coefficient of variation (CV): 0.3824051241
Kurtosis: 0.2583928523
Mean: 1.713646532
Median Absolute Deviation (MAD): 0
Skewness: 0.2214062069
Sum: 3064
Variance: 0.4294275459
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=5)]

Common values

Value | Count | Frequency (%)
2 | 979 | 39.0%
1 | 630 | 25.1%
3 | 144 | 5.7%
0 | 24 | 1.0%
4 | 11 | 0.4%
(Missing) | 721 | 28.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 24 | 1.0%
1 | 630 | 25.1%
2 | 979 | 39.0%
3 | 144 | 5.7%
4 | 11 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
4 | 11 | 0.4%
3 | 144 | 5.7%
2 | 979 | 39.0%
1 | 630 | 25.1%
0 | 24 | 1.0%
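
Because Tumor Stage takes only five distinct values, its summary statistics can be recomputed directly from the frequency table. The sketch below does so for the sum, mean and variance. It assumes the report uses the sample variance (denominator n - 1), which is consistent with the reported figure; the variable names are illustrative.

```python
# Stage -> count, copied from the Tumor Stage frequency table
# (the 721 missing rows are excluded).
stage_counts = {0: 24, 1: 630, 2: 979, 3: 144, 4: 11}

n = sum(stage_counts.values())
total = sum(stage * count for stage, count in stage_counts.items())
mean = total / n

# Assumption: sample variance with denominator n - 1.
sum_sq_dev = sum(count * (stage - mean) ** 2 for stage, count in stage_counts.items())
variance = sum_sq_dev / (n - 1)

print(f"n = {n}, sum = {total}")
print(f"mean = {mean:.9f}")
print(f"variance = {variance:.10f}")
```

The non-missing count plus the 721 missing rows gives the 2509 observations of the dataset.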
 

Patient's Vital Status
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.2%
Missing: 529
Missing (%): 21.1%
Memory size: 19.6 KiB

Value | Count | Frequency (%)
Living | 837 | 33.4%
Died of Disease | 646 | 25.7%
Died of Other Causes | 497 | 19.8%
(Missing) | 529 | 21.1%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 20
Median length: 6
Mean length: 10.45795138
Min length: 3
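
The high correlation reported between Patient's Vital Status and Overall Survival Status is plausible if the second is a coarser version of the first. The sketch below collapses the vital-status counts into the two survival categories and compares them with the reported Overall Survival Status counts. The mapping `TO_SURVIVAL_STATUS` is an assumption made for illustration, not something the report states.

```python
# Counts copied from the two frequency tables.
vital_status_counts = {
    "Living": 837,
    "Died of Disease": 646,
    "Died of Other Causes": 497,
}
reported_survival = {"Deceased": 1144, "Living": 837}

# Assumed mapping from the detailed to the coarse status.
TO_SURVIVAL_STATUS = {
    "Living": "Living",
    "Died of Disease": "Deceased",
    "Died of Other Causes": "Deceased",
}

collapsed = {}
for label, count in vital_status_counts.items():
    key = TO_SURVIVAL_STATUS[label]
    collapsed[key] = collapsed.get(key, 0) + count

# Difference between the reported and the collapsed counts per category.
diff = {key: reported_survival[key] - collapsed[key] for key in reported_survival}
print(collapsed)
print(diff)
```

Under this mapping the two columns differ by a single "Deceased" row, which matches the one-row difference in their missing counts (529 versus 528).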

Interactions

[Figures: pairwise interaction plots]
2022-07-02T14:54:28.657391image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:28.810390image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:28.953426image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:29.094438image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:29.245398image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:29.520390image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:29.677392image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:29.831388image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.056426image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.280386image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.437392image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.583421image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.722437image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:30.874391image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.027390image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.182422image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.352390image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.517392image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.684389image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:31.851393image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.011393image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.155388image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.373386image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.597393image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.753435image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-07-02T14:54:32.916422image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
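
The calculation described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` is an assumption, and the two lists are the Overall Survival (Months) and Relapse Free Status (Months) values of sample rows 0 to 8 from the "First rows" table.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors of the covariance and of the two standard deviations
    cancel, so plain sums of squares and cross-products are enough.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Sample rows 0-8 of the dataset (rows with missing values left out).
overall_survival = [140.5, 84.633333, 163.7, 164.933333, 41.366667,
                    7.8, 164.333333, 22.4, 99.533333]
relapse_free = [138.65, 83.52, 151.28, 162.76, 18.55,
                2.89, 162.17, 11.74, 98.22]

r = pearson_r(overall_survival, relapse_free)

# Invariance check: shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([12 * v + 3 for v in overall_survival], relapse_free)
print(round(r, 4), round(r_rescaled, 4))
```

On nine rows the estimate is only indicative, but it is in line with the "High correlation" warning the report raises for these two columns.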

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
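
As a rough sketch of that recipe, the snippet below ranks each variable (tied values share the average of their positions, which is an assumption about tie handling) and then applies the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` and the toy data are illustrative only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cross / math.sqrt(ss_x * ss_y)


# A monotonic but nonlinear relationship (toy data): y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)  # 1.0, because the ranks of x and y agree exactly
```

Pearson's r on the same toy data is below 1 because the relationship is not linear, which is the difference the description above points at.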

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
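
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition on toy data; the function name `kendall_tau_a` is hypothetical. Library implementations commonly use the tau-b variant, which adjusts the denominator for ties, so their results can differ from this sketch when the data contain tied values.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair among four observations.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)  # 5 concordant and 1 discordant pair out of 6, i.e. 4/6
```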

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
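
For orientation, here is a minimal standard-library sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013). It is an illustration under that assumption, not the report generator's own code; the function name `cramers_v_corrected` is hypothetical. The two lists are the Overall Survival Status and Patient's Vital Status values of sample rows 0 to 8 from the "First rows" table.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    joint = Counter(zip(a, b))
    rows = Counter(a)
    cols = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_total in rows.items():
        for col_label, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(rows), len(cols)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(phi2_corr / denom)


# Sample rows 0-8 of the dataset.
survival_status = ["Living", "Living", "Deceased", "Living", "Deceased",
                   "Deceased", "Living", "Deceased", "Deceased"]
vital_status = ["Living", "Living", "Died of Disease", "Living",
                "Died of Disease", "Died of Disease", "Living",
                "Died of Disease", "Died of Other Causes"]

v = cramers_v_corrected(survival_status, vital_status)
print(round(v, 3))
```

With only nine rows the correction term is large, so the value stays below 1 even though each vital status maps to a single survival status in this small sample.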

Missing values

[Figures: four missing-value plots]

Sample

First rows

Index | Patient ID | Age at Diagnosis | Type of Breast Surgery | Cancer Type | Cancer Type Detailed | Cellularity | Chemotherapy | Pam50 + Claudin-low subtype | Cohort | ER status measured by IHC | ER Status | Neoplasm Histologic Grade | HER2 status measured by SNP6 | HER2 Status | Tumor Other Histologic Subtype | Hormone Therapy | Inferred Menopausal State | Integrative Cluster | Primary Tumor Laterality | Lymph nodes examined positive | Mutation Count | Nottingham prognostic index | Oncotree Code | Overall Survival (Months) | Overall Survival Status | PR Status | Radio Therapy | Relapse Free Status (Months) | Relapse Free Status | Sex | 3-Gene classifier subtype | Tumor Size | Tumor Stage | Patient's Vital Status
0 | MB-0000 | 75.65 | Mastectomy | Breast Cancer | Breast Invasive Ductal Carcinoma | NaN | No | claudin-low | 1.0 | Positve | Positive | 3.0 | Neutral | Negative | Ductal/NST | Yes | Post | 4ER+ | Right | 10.0 | NaN | 6.044 | IDC | 140.500000 | Living | Negative | Yes | 138.65 | Not Recurred | Female | ER-/HER2- | 22.0 | 2.0 | Living
1 | MB-0002 | 43.19 | Breast Conserving | Breast Cancer | Breast Invasive Ductal Carcinoma | High | No | LumA | 1.0 | Positve | Positive | 3.0 | Neutral | Negative | Ductal/NST | Yes | Pre | 4ER+ | Right | 0.0 | 2.0 | 4.020 | IDC | 84.633333 | Living | Positive | Yes | 83.52 | Not Recurred | Female | ER+/HER2- High Prolif | 10.0 | 1.0 | Living
2 | MB-0005 | 48.87 | Mastectomy | Breast Cancer | Breast Invasive Ductal Carcinoma | High | Yes | LumB | 1.0 | Positve | Positive | 2.0 | Neutral | Negative | Ductal/NST | Yes | Pre | 3 | Right | 1.0 | 2.0 | 4.030 | IDC | 163.700000 | Deceased | Positive | No | 151.28 | Recurred | Female | NaN | 15.0 | 2.0 | Died of Disease
3 | MB-0006 | 47.68 | Mastectomy | Breast Cancer | Breast Mixed Ductal and Lobular Carcinoma | Moderate | Yes | LumB | 1.0 | Positve | Positive | 2.0 | Neutral | Negative | Mixed | Yes | Pre | 9 | Right | 3.0 | 1.0 | 4.050 | MDLC | 164.933333 | Living | Positive | Yes | 162.76 | Not Recurred | Female | NaN | 25.0 | 2.0 | Living
4 | MB-0008 | 76.97 | Mastectomy | Breast Cancer | Breast Mixed Ductal and Lobular Carcinoma | High | Yes | LumB | 1.0 | Positve | Positive | 3.0 | Neutral | Negative | Mixed | Yes | Post | 9 | Right | 8.0 | 2.0 | 6.080 | MDLC | 41.366667 | Deceased | Positive | Yes | 18.55 | Recurred | Female | ER+/HER2- High Prolif | 40.0 | 2.0 | Died of Disease
5 | MB-0010 | 78.77 | Mastectomy | Breast Cancer | Breast Invasive Ductal Carcinoma | Moderate | No | LumB | 1.0 | Positve | Positive | 3.0 | Neutral | Negative | Ductal/NST | Yes | Post | 7 | Left | 0.0 | 4.0 | 4.062 | IDC | 7.800000 | Deceased | Positive | Yes | 2.89 | Recurred | Female | ER+/HER2- High Prolif | 31.0 | 4.0 | Died of Disease
6 | MB-0014 | 56.45 | Breast Conserving | Breast Cancer | Breast Invasive Ductal Carcinoma | Moderate | Yes | LumB | 1.0 | Positve | Positive | 2.0 | Loss | Negative | Ductal/NST | Yes | Post | 3 | Right | 1.0 | 4.0 | 4.020 | IDC | 164.333333 | Living | Positive | Yes | 162.17 | Not Recurred | Female | NaN | 10.0 | 2.0 | Living
7 | MB-0020 | 70.00 | Mastectomy | Breast Cancer | Breast Invasive Lobular Carcinoma | High | Yes | Normal | 1.0 | Negative | Negative | 3.0 | Neutral | Negative | Lobular | No | Post | 4ER- | Left | NaN | NaN | 6.130 | ILC | 22.400000 | Deceased | Negative | Yes | 11.74 | Recurred | Female | ER-/HER2- | 65.0 | 3.0 | Died of Disease
8 | MB-0022 | 89.08 | Breast Conserving | Breast Cancer | Breast Mixed Ductal and Lobular Carcinoma | Moderate | No | claudin-low | 1.0 | Positve | Positive | 2.0 | Neutral | Negative | Mixed | Yes | Post | 3 | Left | 1.0 | 1.0 | 4.058 | MDLC | 99.533333 | Deceased | Negative | Yes | 98.22 | Not Recurred | Female | NaN | 29.0 | 2.0 | Died of Other Causes
9 | MB-0025 | 76.24 | NaN | Breast Cancer | Breast Invasive Ductal Carcinoma | NaN | NaN | NaN | 1.0 | Positve | Positive | 3.0 | NaN | NaN | Ductal/NST | NaN | NaN | NaN | NaN | 11.0 | 5.0 | 6.680 | IDC | NaN | NaN | NaN | NaN | 126.32 | Recurred | Female | NaN | 34.0 | 2.0 | NaN

Last rows

Index | Patient ID | Age at Diagnosis | Type of Breast Surgery | Cancer Type | Cancer Type Detailed | Cellularity | Chemotherapy | Pam50 + Claudin-low subtype | Cohort | ER status measured by IHC | ER Status | Neoplasm Histologic Grade | HER2 status measured by SNP6 | HER2 Status | Tumor Other Histologic Subtype | Hormone Therapy | Inferred Menopausal State | Integrative Cluster | Primary Tumor Laterality | Lymph nodes examined positive | Mutation Count | Nottingham prognostic index | Oncotree Code | Overall Survival (Months) | Overall Survival Status | PR Status | Radio Therapy | Relapse Free Status (Months) | Relapse Free Status | Sex | 3-Gene classifier subtype | Tumor Size | Tumor Stage | Patient's Vital Status
2499 | MTS-T2423 | 76.22 | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | 1.0 | Negative | Negative | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 8.0 | 4.62 | BRCA | NaN | NaN | NaN | NaN | 46.88 | Recurred | Female | NaN | 31.0 | 2.0 | NaN
2500 | MTS-T2424 | 69.00 | NaN | Breast Cancer | Breast Invasive Ductal Carcinoma | NaN | NaN | NaN | 9.0 | Negative | Negative | 3.0 | NaN | NaN | Ductal/NST | NaN | NaN | NaN | NaN | 2.0 | 3.0 | NaN | IDC | NaN | NaN | NaN | NaN | 55.63 | Recurred | Female | NaN | 28.0 | 2.0 | NaN
2501 | MTS-T2425 | 72.00 | NaN | Breast Cancer | Breast Invasive Ductal Carcinoma | NaN | NaN | NaN | 9.0 | Negative | Negative | 3.0 | NaN | NaN | Ductal/NST | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | IDC | NaN | NaN | NaN | NaN | 54.90 | Not Recurred | Female | NaN | 23.0 | NaN | NaN
2502 | MTS-T2426 | 64.35 | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | 1.0 | Positve | Positive | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 2.0 | 4.38 | BRCA | NaN | NaN | NaN | NaN | 72.76 | Not Recurred | Female | NaN | 19.0 | 2.0 | NaN
2503 | MTS-T2427 | 67.58 | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | 1.0 | Positve | Positive | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | 2.18 | BRCA | NaN | NaN | NaN | NaN | 62.43 | Not Recurred | Female | NaN | 9.0 | 1.0 | NaN
2504 | MTS-T2428 | 70.05 | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | 1.0 | Positve | Positive | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 2.0 | 2.54 | BRCA | NaN | NaN | NaN | NaN | 4.93 | Recurred | Female | NaN | 27.0 | 1.0 | NaN
2505 | MTS-T2429 | 63.60 | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | 1.0 | Positve | Positive | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 4.0 | 4.56 | BRCA | NaN | NaN | NaN | NaN | 16.18 | Recurred | Female | NaN | 28.0 | 2.0 | NaN
2506 | MTS-T2430 | NaN | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 6.0 | NaN | BRCA | NaN | NaN | NaN | NaN | NaN | NaN | Female | NaN | NaN | 0.0 | NaN
2507 | MTS-T2431 | NaN | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 7.0 | NaN | BRCA | NaN | NaN | NaN | NaN | NaN | NaN | Female | NaN | NaN | 0.0 | NaN
2508 | MTS-T2432 | NaN | NaN | Breast Cancer | Invasive Breast Carcinoma | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 5.0 | NaN | BRCA | NaN | NaN | NaN | NaN | NaN | NaN | Female | NaN | NaN | 0.0 | NaN